Take-Home Exercise 3: Predicting HDB Public Housing Resale Pricies using Geographically Weighted Methods

Author

Wong Kelly

Modified

March 19, 2023

1. Overview

Housing is an essential component of household wealth worldwide. Buying a housing has always been a major investment for most people. The price of housing is affected by many factors. Some of them are global in nature such as the general economy of a country or inflation rate. Others can be more specific to the properties themselves. These factors can be further divided to structural and locational factors. Structural factors are variables related to the property themselves such as the size, fitting, and tenure of the property. Locational factors are variables related to the neighbourhood of the properties such as proximity to childcare centre, public transport service and shopping centre.

Conventional, housing resale prices predictive models were built by using Ordinary Least Square (OLS) method. However, this method failed to take into consideration that spatial autocorrelation and spatial heterogeneity exist in geographic data sets such as housing transactions. With the existence of spatial autocorrelation, the OLS estimation of predictive housing resale pricing models could lead to biased, inconsistent, or inefficient results (Anselin 1998). In view of this limitation, Geographical Weighted Models were introduced for calibrating predictive model for housing resale prices.

1.1 The Task

In this take-home exercise, we are tasked to predict HDB resale prices at the sub-market level (i.e. HDB 3-room, HDB 4-room and HDB 5-room) for the month of January and February 2023 in Singapore. The predictive models must be built by using by using conventional OLS method and GWR methods. We are also required to compare the performance of the conventional OLS method versus the geographical weighted methods.

2. Data Acquisition Source

For the purpose of this take-home exercise, HDB Resale Flat Prices provided by Data.gov.sg should be used as the core data set. The study should focus on either three-room, four-room or five-room flat and transaction period should be from 1st January 2021 to 31st December 2022. The test data should be January and February 2023 resale prices.

Data summary table

Type Name Format Source
Aspatial HDB Resale Flat Prices .csv data.gov.sg
Geospatial Master Plan 2014 Subzone Web Boundary .shp data.gov.sg
Geospatial (locational factor) Eldercare Service .shp data.gov.sg
Geospatial (locational factor) Park & Open Space .shp data.gov.sg
Geosaptial (locational factor) Supermarket .csv data.gov.sg
Geospatial (locational factor) Hawker Centre .geoson data.gov.sg
Geospatial (locational factor) Bus Stop Location .shp LTA Data Mall

**Find more geospatial (locational factors) data

3. Getting Started

3.1 Installing and Loading the R packages

pacman::p_load(olsrr, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gtsummary, SpatialML,rsample,Metrics, jsonlite,httr,rvest,sp)

The R packages installed that we will be using for analysis are:

**(to be completed)

4. Data Wrangling: Geospatial Data & Aspatial Data

4.1 Importing Aspatial Data

Import Data

resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")

Display Data

glimpse(resale)
Rows: 148,000
Columns: 11
$ month               <chr> "2017-01", "2017-01", "2017-01", "2017-01", "2017-…
$ town                <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type           <chr> "2 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", …
$ block               <chr> "406", "108", "602", "465", "601", "150", "447", "…
$ street_name         <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 4", "ANG MO K…
$ storey_range        <chr> "10 TO 12", "01 TO 03", "01 TO 03", "04 TO 06", "0…
$ floor_area_sqm      <dbl> 44, 67, 67, 68, 67, 68, 68, 67, 68, 67, 68, 67, 67…
$ flat_model          <chr> "Improved", "New Generation", "New Generation", "N…
$ lease_commence_date <dbl> 1979, 1978, 1980, 1980, 1980, 1981, 1979, 1976, 19…
$ remaining_lease     <chr> "61 years 04 months", "60 years 07 months", "62 ye…
$ resale_price        <dbl> 232000, 250000, 262000, 265000, 265000, 275000, 28…

From the results above, we can tell that:

  • The dataset contains 11 columns with 148,000 rows in total.

  • The timeframe of the dataset is from 2017 January to 2023 February up to date.

  • The columns that are present in the data are: month, town, flat_type, block, street_name, storey_range, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, resale_price.

In this take-home assignment, I selected HDB 4-room flat resale prices to analyse during the transaction period from 1st January 2021 to 31st December 2022. Therefore, we will need to filter and only extract data during this period of time frame.

4.1.1 Filter Resale Data (ONLY interested transaction period)

Filter Data

rs_subset <-  filter(resale,flat_type == "4 ROOM") %>% 
              filter(month >= "2021-01" & month <= "2022-12")

Display Date

glimpse(rs_subset)
Rows: 23,657
Columns: 11
$ month               <chr> "2021-01", "2021-01", "2021-01", "2021-01", "2021-…
$ town                <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type           <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", …
$ block               <chr> "547", "414", "509", "467", "571", "134", "204", "…
$ street_name         <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", "ANG MO …
$ storey_range        <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09", "0…
$ floor_area_sqm      <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 109, 9…
$ flat_model          <chr> "New Generation", "New Generation", "New Generatio…
$ lease_commence_date <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, 1978, 19…
$ remaining_lease     <chr> "59 years", "57 years 09 months", "58 years 06 mon…
$ resale_price        <dbl> 370000, 375000, 380000, 385000, 410000, 410000, 41…

From the results above, we can tell that:

  • We have successfully filtered our data based on earlier chosen HDB model flat and transaction period!

  • From January 2021 to December 2022, there are 23,657 transactions for 4-room flat in Singapore.

4.1.2 Transform Resale Data

After we have extracted the rows of transactions we are interested in, we will then proceed to use mutate function of dplyr package to create new variables (columns) in a data frame by applying some transformations to the existing columns.

What we will need to do is:

  • address: concatenation of the block and street_name columns using paste() function of base R package.

  • remaining_lease_yr & remaining_lease_mnth: Split the year and months part of the remaining_lease respectively using str_sub() function of stringr package then converting the character to integer using as.integer() function of base R package.

  • After performing mutate function, we will store the new data in rs_transform.

rs_transform <- rs_subset %>%
  mutate(rs_subset, address = paste(block,street_name)) %>%
  mutate(rs_subset, remaining_lease_yr = as.integer(str_sub(remaining_lease, 0, 2))) %>%
  mutate(rs_subset, remaining_lease_mnth = as.integer(str_sub(remaining_lease, 9, 11)))

After we have successfully added the three variables (address, remaining_lease_yr, and remaining_lease_mnth) into a new data named rs_transform, we will see some NA values in the remaining_lease_mnth column. Therefore, we will need to replace those with a value of 0 using is.na() function of base R package.

rs_transform$remaining_lease_mnth[is.na(rs_transform$remaining_lease_mnth)] <- 0
rs_transform
# A tibble: 23,657 × 14
   month   town    flat_…¹ block stree…² store…³ floor…⁴ flat_…⁵ lease…⁶ remai…⁷
   <chr>   <chr>   <chr>   <chr> <chr>   <chr>     <dbl> <chr>     <dbl> <chr>  
 1 2021-01 ANG MO… 4 ROOM  547   ANG MO… 04 TO …      92 New Ge…    1981 59 yea…
 2 2021-01 ANG MO… 4 ROOM  414   ANG MO… 01 TO …      92 New Ge…    1979 57 yea…
 3 2021-01 ANG MO… 4 ROOM  509   ANG MO… 01 TO …      91 New Ge…    1980 58 yea…
 4 2021-01 ANG MO… 4 ROOM  467   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
 5 2021-01 ANG MO… 4 ROOM  571   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
 6 2021-01 ANG MO… 4 ROOM  134   ANG MO… 07 TO …      98 New Ge…    1978 56 yea…
 7 2021-01 ANG MO… 4 ROOM  204   ANG MO… 07 TO …      92 New Ge…    1977 55 yea…
 8 2021-01 ANG MO… 4 ROOM  429   ANG MO… 04 TO …      92 New Ge…    1978 56 yea…
 9 2021-01 ANG MO… 4 ROOM  413   ANG MO… 10 TO …      92 New Ge…    1979 57 yea…
10 2021-01 ANG MO… 4 ROOM  415   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
# … with 23,647 more rows, 4 more variables: resale_price <dbl>, address <chr>,
#   remaining_lease_yr <int>, remaining_lease_mnth <dbl>, and abbreviated
#   variable names ¹​flat_type, ²​street_name, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date, ⁷​remaining_lease

Now, as we scroll to the remaining_lease_mnth column, we noticed all initial “NA” values have been replaced by 0!

Next, we do not want to segregate the remaining lease in years and months columns. Instead, we could convert the remaining_lease_yr to months unit and create a new column call total_remaining_lease for easier analysis later using mutate function of dplyr package which contains the summation of the remaining_lease_yr and remaining_lease_mnth using rowSum() function of base R package.

Multiply remaining_lease_yr column in months unit

rs_transform$remaining_lease_yr <- rs_transform$remaining_lease_yr * 12

Create a new column: total_remaining_lease to contain the summation of yr and mnth

rs_transform <- rs_transform %>% 
  mutate(rs_transform, total_remaining_lease = rowSums(rs_transform[, c("remaining_lease_yr", "remaining_lease_mnth")])) %>%
  select(month, town, address, block, street_name, flat_type, storey_range, floor_area_sqm, flat_model, 
         lease_commence_date, total_remaining_lease, resale_price)

Display head of data

head(rs_transform)
# A tibble: 6 × 12
  month   town     address block stree…¹ flat_…² store…³ floor…⁴ flat_…⁵ lease…⁶
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>   <chr>     <dbl> <chr>     <dbl>
1 2021-01 ANG MO … 547 AN… 547   ANG MO… 4 ROOM  04 TO …      92 New Ge…    1981
2 2021-01 ANG MO … 414 AN… 414   ANG MO… 4 ROOM  01 TO …      92 New Ge…    1979
3 2021-01 ANG MO … 509 AN… 509   ANG MO… 4 ROOM  01 TO …      91 New Ge…    1980
4 2021-01 ANG MO … 467 AN… 467   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
5 2021-01 ANG MO … 571 AN… 571   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
6 2021-01 ANG MO … 134 AN… 134   ANG MO… 4 ROOM  07 TO …      98 New Ge…    1978
# … with 2 more variables: total_remaining_lease <dbl>, resale_price <dbl>, and
#   abbreviated variable names ¹​street_name, ²​flat_type, ³​storey_range,
#   ⁴​floor_area_sqm, ⁵​flat_model, ⁶​lease_commence_date

Upon inspection of the rs_transform, we now only left with one column: total_remaining_lease that contains all the remaining lease in months!

4.1.3 Retrieve Postal Codes and Coordinates of Addresses

In this section, we will focus on retrieving the relevant data like postal codes and coordinates of the address which is required to get the proximity to locational factors in the later parts.

Here are the steps to add its longitude and latitude features with OneMapSG API!

Step 1: Create a list storing unique addresses

add_list <- sort(unique(rs_transform$address))

Step 2: Create function to retrieve coordinates from OneMapSG API

(Add explanation of code here)

get_coords <- function(add_list){
  
  # Create a data frame to store all retrieved coordinates
  postal_coords <- data.frame()
    
  for (i in add_list){

    r <- GET('https://developers.onemap.sg/commonapi/search?',
           query=list(searchVal=i,
                     returnGeom='Y',
                     getAddrDetails='Y'))
    
    # Send a GET request to OneMap API with address as searchVal,
    # returnGeom as 'Y' to retrieve the coordinates, and getAddrDetails as 'Y' to retrieve the postal code

    
    data <- fromJSON(rawToChar(r$content))
    found <- data$found
    res <- data$results
    
    # Extract the 'found' and 'results' fields from the API reponses
    
    # Create a new data frame for each address
    new_row <- data.frame()
    
    # If single result, append 
    if (found == 1){
      postal <- res$POSTAL 
      lat <- res$LATITUDE
      lng <- res$LONGITUDE
      new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
    }
    
    # If multiple results, drop NIL and append top 1
    else if (found > 1){
      # Remove those with NIL as postal
      res_sub <- res[res$POSTAL != "NIL", ]
      
      # Set as NA first if no Postal
      if (nrow(res_sub) == 0) {
          new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
      }
      
      else{
        top1 <- head(res_sub, n = 1)
        postal <- top1$POSTAL 
        lat <- top1$LATITUDE
        lng <- top1$LONGITUDE
        new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
      }
    }

    else {
      new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
    }
    
    # Add the row
    postal_coords <- rbind(postal_coords, new_row)
  }
  return(postal_coords)
}

Step 3: Call get_coords function to retrieve resale coordinates

coords <- get_coords(add_list)
coords$postal[is.na(coords$postal)]
character(0)
coords$latitude[is.na(coords$latitude)]
character(0)
coords$longitude[is.na(coords$longitude)]
character(0)

Step 4: Combine resale and coordinates data

rs_coords <- left_join(rs_transform, coords, by = c('address' = 'address'))

Step 5: Write file to rds

Now that out subset resale dataset is completed with the coordinates, we can now save it as an rds file!

rs_coords_rds <- write_rds(rs_coords, "data/aspatial/rds/rs_coords.rds")
rs_coords <- read_rds("data/aspatial/rds/rs_coords.rds")
rs_coords_sf <- st_as_sf(rs_coords,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)
length(which(st_is_valid(rs_coords_sf) == FALSE))
[1] 0
tmap_mode("view")
tm_shape(rs_coords_sf)+
  tm_dots(col="red", size = 0.02)
tmap_mode("plot")

5. Import Locational Factors Data

5.1 Locational Factors with Geographic Coordinates

Step 1: Read and check CRS of locational factors

hawker_sf <- st_read("data/geospatial/hawker-centres/hawker-centres-geojson.geojson")
Reading layer `hawker-centres-geojson' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial\hawker-centres\hawker-centres-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449217
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
elder_sf <- st_read(dsn = "data/geospatial/eldercare-services", layer="ELDERCARE")
Reading layer `ELDERCARE' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial\eldercare-services' 
  using driver `ESRI Shapefile'
Simple feature collection with 133 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21
bus_sf <- st_read(dsn = "data/geospatial/BusStopLocation", layer="BusStop")
Reading layer `BusStop' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial\BusStopLocation' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
st_crs(hawker_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(elder_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

Step 2: Assign EPSG code to sf dataframe and check again

elder_sf <- st_set_crs(elder_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)

hawker_sf <- hawker_sf %>%
  st_transform(crs = 3414)
st_crs(hawker_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(elder_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Step 3: Check for invalid geometries

length(which(st_is_valid(elder_sf) == FALSE))
[1] 0
length(which(st_is_valid(hawker_sf) == FALSE))
[1] 0
length(which(st_is_valid(bus_sf) == FALSE))
[1] 0

5.2 Proximity Distance Calculation

Step 1: Create get_prox function to calculate proximity

get_prox <- function(df1, df2, varname){
  
  # creates a matrix of distances
  dist_matrix <- st_distance(df1, df2)           
  
  # find the nearest location_factor and create new data frame
  near <- df1 %>% 
    mutate(PROX = apply(dist_matrix, 1, function(x) min(x)) / 1000) 
  
  # rename column name according to input parameter
  names(near)[names(near) == 'PROX'] <- varname

  # Return df
  return(near)
}

Step 2: Call get_prox function

rs_coords_sf <- get_prox(rs_coords_sf, elder_sf, "PROX_ELDERLYCARE") 
rs_coords_sf <- get_prox(rs_coords_sf, hawker_sf, "PROX_HAWKER") 

5.3 Count the number of locational factors within the proximity

Step 1: Create get_within function to calculate no.of factors within distance

get_within <- function(df1, df2, threshold_dist, varname){
  
  # creates a matrix of distances
  dist_matrix <- st_distance(df1, df2)   
  
  # count the number of location_factors within threshold_dist and create new data frame
  wdist <- df1 %>% 
    mutate(WITHIN_DT = apply(dist_matrix, 1, function(x) sum(x <= threshold_dist)))
  
  # rename column name according to input parameter
  names(wdist)[names(wdist) == 'WITHIN_DT'] <- varname

  # Return df
  return(wdist)
}

Step 2: Call get_within function

rs_coords_sf <- get_within(rs_coords_sf, bus_sf, 350, "WITHIN_350M_BUS")